Abstract

Four main results are arrived at in this paper. (1) Closed convex
sets of classical probability functions provide a representation of belief
that includes the representations provided by Shafer probability mass
functions as a special case. (2) The impact of "uncertain evidence"
can be (formally) represented by Dempster conditioning, in Shafer's
framework. (3) The impact of "uncertain evidence" can be (formally)
represented in the framework of convex sets of classical probabilities
by classical conditionalization. (4) The probability intervals that result
from Dempster/Shafer updating on uncertain evidence are included
in (and may be properly included in) the intervals that result from
Bayesian updating on uncertain evidence.
BAYESIAN AND NON-BAYESIAN EVIDENTIAL MEASURES*


1. Recent work in both vision systems (Garvey, Wesley) and
in knowledge representation (Lowrance, Barnett, Quinlan, Dillard)
has employed an alternative to classical Bayesian updating of uncertain
knowledge, often referred to as Dempster/Shafer updating.
Various other investigators have gone beyond classical Bayesian
conditionalization (MYCIN, EMYCIN, DENDRAL, ...) but in a less
systematic manner. It is appropriate to examine the formal relations
between various Bayesian and non-Bayesian approaches to what has
come to be called evidence theory, in order to explore the question
of whether the new techniques are really more powerful than the old,
and the question of whether, if they are, this increment of power
is bought at too high a price.

2. Classical probability theory supposes (1) that we commence
with known statistical distributions, (2) that these distributions
are such as to give rise to real-valued probabilities, and (3) that
these probabilities can be modified by using Bayes' theorem to
conditionalize on evidence that is taken to be certain. There are
thus three ways to modify the classical theory.

We may dispense with the supposition that we are dealing with
known statistical distributions. The best-known advocate of this
gambit was L. J. Savage, who argued that probabilities represent
personal, subjective opinions, and not objective distributions
of quantities in the world. This approach has given rise to Bayesian
statistics, based on the fact that the opinions of most people are
such that, faced with frequency data, they will converge reasonably
rapidly. Furthermore, in practice, it is common to recognize that
some opinions are better than others, and to use as prior distributions
in statistical inference distributions representing the
opinions of knowledgeable experts. This approach has been incorporated
in some expert systems, for example, PROSPECTOR. It has
both virtues and limitations. A purely pragmatic virtue is that it
allows us to get on with our business even when we don't have the
knowledge of prior distributions we would like to have. It has the
practical virtue that the considered opinions of genuinely knowledgeable
experts are formed in response to, and reflect with some degree
of accuracy, relative frequencies in nature. But it has two drawbacks:
it does not incorporate any indication of whether the opinion
is a wild guess or a considered judgement based on long experience;
and it calls for expert opinions even in the face of total, acknowledged
ignorance.
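The convergence of opinion mentioned above can be illustrated with a minimal sketch, assuming (as an illustration only, not something the text specifies) that each person's opinion about a chance parameter is a Beta prior updated on Bernoulli data. The two priors, the helper names `posterior_mean` and `gap`, and the 30% data are hypothetical.

```python
from fractions import Fraction as F

def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of a chance parameter under a Beta(alpha, beta) prior
    after observing `successes` in `trials` Bernoulli trials."""
    return F(alpha + successes, alpha + beta + trials)

# Two hypothetical opinions of equal strength but opposite leanings.
OPTIMIST = (8, 2)  # prior mean 4/5
SCEPTIC = (2, 8)   # prior mean 1/5

def gap(successes, trials):
    """Distance between the two posterior means on the same data."""
    a = posterior_mean(*OPTIMIST, successes, trials)
    b = posterior_mean(*SCEPTIC, successes, trials)
    return abs(a - b)

for n in (0, 10, 100, 1000):
    k = 3 * n // 10  # hypothetical data: 30% successes
    print(n, float(gap(k, n)))
```

With priors of equal weight the gap here is 6/(10 + n) whatever the data, so it shrinks steadily as frequency data accumulate.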

This suggests the second departure from the classical picture:
abandoning the assumption that our probabilities are point-valued.
This has recently been hailed as a novel departure (Lowrance, 1982,
p. 21; Garvey et al., 1981, p. 319; Dillard, 1982, p. 1; Lowrance
and Garvey, 1982, p. 7; Wesley and Hanson, 1982, p. 16; Quinlan, 1982,
p. 9). The idea of representing probabilities by intervals is not
new (cf. Kyburg, Good, Levi, Smith), and the notion of probabilities
that constitute a field richer than that of the real numbers goes
back even further (Keynes, 1921, offers a formal philosophical
treatment of such entities; B. O. Koopman, 1941, 1942, offers a
mathematical characterization). Even the standard subjectivistic
or personalistic view of probability can be construed in this way:
while each person has a set of real-valued probabilities defined
over a given field, a group of people will reflect a set of probability
functions defined over the field. We may quite reasonably
focus our attention on the supremum and infimum of these functions
evaluated at a member of the field.
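As a small sketch of the idea, suppose (hypothetically) that three people's probability functions over four atoms are known exactly. The lower and upper probabilities of an event are then the minimum and maximum of its probability across the group; the atom names, values, and the helper `lower_upper` are illustrative assumptions.

```python
from fractions import Fraction as F

# Hypothetical example: three people's probability functions over four atoms.
GROUP = [
    {"a1": F(1, 2), "a2": F(1, 4), "a3": F(1, 8), "a4": F(1, 8)},
    {"a1": F(1, 4), "a2": F(1, 4), "a3": F(1, 4), "a4": F(1, 4)},
    {"a1": F(1, 10), "a2": F(3, 5), "a3": F(1, 5), "a4": F(1, 10)},
]

def prob(p, event):
    """Probability of an event (a set of atoms) under one function p."""
    return sum(p[a] for a in event)

def lower_upper(group, event):
    """Infimum and supremum of P(event) over the group.

    For a finite set these are attained, so min/max suffice. Adding convex
    combinations of the functions would not change the bounds, because
    P(event) is linear in P.
    """
    values = [prob(p, event) for p in group]
    return min(values), max(values)

lo, hi = lower_upper(GROUP, {"a1", "a2"})
print(lo, hi)  # 1/2 3/4
```

The bounds are conjugate: the lower probability of an event is one minus the upper probability of its complement.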

In general, the representation in terms of intervals seems superior
to the representation in terms of point values. Even in the ideal case,
in which all of our measures are based on statistical inference from
suitably massive quantities of data, it is most natural to construe
these measures as being constrained by intervals. In confidence
interval estimation, for example, what we get from our statistics
is a high confidence that a given parameter is contained in a certain
interval. This translates neatly and conveniently into an interval
constraint. The results of statistical inference should reflect
indeterminacy or vagueness: what we can properly claim to know is
not that a parameter has a certain value, but that it lies within
certain limits. This limitation of human knowledge should surely
be mirrored in computer-based systems.
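To make the step from confidence interval to interval constraint concrete, here is a sketch that turns observed frequency counts into bounds on a probability. The text does not say which interval procedure to use; the Wilson score interval at roughly 95% confidence is our assumed choice, and `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 ~ 95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return max(0.0, centre - half), min(1.0, centre + half)

# The interval, rather than the point estimate 0.30, becomes the
# constraint lo <= P(success) <= hi used by the system.
lo, hi = wilson_interval(30, 100)
print(round(lo, 3), round(hi, 3))
```

Less data yields a wider interval, so the constraint itself records how much we know.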

The third departure from the classical scheme is to consider
alternatives to Bayes' theorem as a way of updating probabilities in
the light of new evidence. This departure is recent, and was first
stated in Dempster, 1967. Dempster's novel rule of combination,
subsequently adopted by Shafer (1976), is often referred to as a
"generalization" of Bayesian inference (Lowrance and Garvey, 1982,
p. 9: "Dempster's rule can be viewed as a direct generalization
of Bayes' rule ..."; Dillard, 1982, p. 1; Garvey et al., p. 319;
Lowrance, 1982, p. 21). This suggests, on the one hand, that Bayes'
rule can be regarded as a special or limiting case of Dempster's rule,
which is true, and on the other hand that Dempster's rule can be applied
where Bayes' rule cannot, which is false. Dempster himself
recognizes (1967, 1968) that his rule results from the imposition
of additional constraints on the Bayesian analysis (see note 4).
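The sense in which Bayes' rule is a special case of Dempster's rule can be checked in a few lines. In this sketch (helper names and numbers hypothetical) the prior puts all its mass on singletons, so it is an ordinary probability function, and the certain evidence $B$ is a mass function with $m(B) = 1$; Dempster's rule of combination then returns exactly the Bayesian conditional distribution.

```python
from fractions import Fraction as F
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozenset."""
    raw = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = a & b
        raw[c] = raw.get(c, 0) + x * y
    conflict = raw.pop(frozenset(), 0)
    if conflict == 1:
        raise ValueError("total conflict: combination undefined")
    return {c: v / (1 - conflict) for c, v in raw.items() if v}

# Hypothetical Bayesian prior: all mass on singletons of Theta = {1, 2, 3, 4}.
prior = {frozenset({i}): F(i, 10) for i in (1, 2, 3, 4)}
# Certain evidence B = {1, 2}, as a categorical mass function.
B = frozenset({1, 2})
evidence = {B: F(1)}

posterior = combine(prior, evidence)
p_b = sum(prior[frozenset({i})] for i in B)
bayes = {frozenset({i}): prior[frozenset({i})] / p_b for i in B}
print(posterior == bayes)  # True
```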

One very serious problem with the usual Bayesian approach to
evidential updating is the quantity of information that must be
embodied in the probability function covering the field of propositions
with which we are concerned. This may be empirical information
(if the underlying probabilities are thought of as being based on
statistical knowledge), psychological information (if a personalistic
interpretation of probability is adopted), or logical information
(if we interpret probability as degree of confirmation, à la Carnap
(1951)). Suppose we consider a field of propositions based on the
logically independent propositions $p_1, \ldots, p_n$. The set of what Carnap
called "state descriptions" induced by this basis consists of $2^n$
atoms, each of which is the conjunction of the $n$ (negated or unnegated)
$p_i$, and a probability must be assigned to each atom. It is obvious that
for reasonably large $n$ this assignment of probabilities presents great
difficulties. But once we have those $2^n$ numbers, we're done: we can
calculate all conditional probabilities as well as the probability of any
proposition in the field based on $p_1, \ldots, p_n$.
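The point that the $2^n$ atom probabilities settle everything can be sketched directly. Here $n = 3$, and the joint assignment (weights 1 to 8, normalised) is hypothetical; `prob` and `cond` are illustrative helpers that compute the probability of any proposition, expressed as a predicate on state descriptions, and any conditional probability.

```python
from fractions import Fraction as F
from itertools import product

N = 3  # number of logically independent basic propositions p_1, ..., p_N

# Each state description fixes the truth value of every p_i: 2**N atoms.
ATOMS = list(product((True, False), repeat=N))

# Hypothetical joint assignment: weights 1..8 normalised by their sum, 36.
WEIGHTS = {atom: F(w, 36) for w, atom in enumerate(ATOMS, start=1)}

def prob(event):
    """Probability of any proposition in the field, given as a predicate on atoms."""
    return sum(p for atom, p in WEIGHTS.items() if event(atom))

def cond(event, given):
    """Conditional probability P(event | given), read off the same table."""
    return prob(lambda a: event(a) and given(a)) / prob(given)

def p1(atom):
    return atom[0]

def p2(atom):
    return atom[1]

print(len(ATOMS), prob(p1), cond(p1, p2))  # 8 5/18 3/14
```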


Is there a saving in effort if we go to a Dempster/Shafer
system? Using the handy representation in Shafer (1976), we take
$\Theta$, the universal set, to be the set of all $2^n$ possibilities
represented by the state descriptions, and assign a mass to each subset
of $\Theta$. This requires $2^{2^n}$ assignments! As far as the number
of parameters to be taken account of is concerned, we are exponentially
worse off. But if we construe probabilities as intervals, or
represent them by convex sets of simple probability functions, we
are just as badly off. (For an example relating mass assignments
to interval assignments, see table I in the appendix. For the general
equivalence, see theorem 1 below.) Dillard (p. 4) refers to
"computational limitations" and Lowrance and Garvey (1982) mention
that with large $\Theta$, maintaining the model is "computationally
infeasible."
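The arithmetic behind "exponentially worse off" is easy to tabulate. This small sketch (the function name is ours) counts the atoms ($2^n$, one probability each) and the subsets of $\Theta$ ($2^{2^n}$, one mass each) for a few values of $n$.

```python
def parameter_counts(n):
    """Numbers of atoms and of subsets of the frame for n basic propositions."""
    atoms = 2 ** n          # state descriptions: one probability each
    subsets = 2 ** atoms    # subsets of Theta: one mass each
    return atoms, subsets

for n in (2, 3, 4, 5):
    print(n, *parameter_counts(n))
```

Already at $n = 5$ there are 32 atoms but more than four billion subsets.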

In either case, we need to find some systematic and computationally
feasible procedure for obtaining the masses or probabilities
we need. Bayesian and non-Bayesian approaches are in
essentially the same difficult situation in this respect, although
there are often plausible ways of systematizing the parameter assignments
on either view.

3. Whether the representation of our initial knowledge state is given
by an assignment of masses to subsets of $\Theta$ or by a set of classical
probability distributions over the atoms of $\Theta$, it is important that
these masses or probabilities be justifiable. As already suggested,
the most straightforward way of obtaining them is through statistical
inference, which (when possible) yields interval-valued estimates
of relative frequencies. But there may also be other ways to obtain
masses or intervals of probability. If so, then the deep and difficult
problem arises of how to combine both statistical and non-statistical
sources of information.


It has been suggested that Dempster/Shafer updating relieves
us of the necessity of making assumptions about the joint probabilities
of the objects we are concerned about. Thus, Quinlan claims that
INFERNO "makes no assumptions whatever about the joint probability
distributions of pieces of knowledge ..." (Quinlan 1982). Other
writers have made similar claims, e.g., Wesley and Hanson, 1982, p. 15.
(To make independence assumptions is exactly to make assumptions about
joint probability distributions.)

It is clear that the assignment of masses to subsets of $\Theta$ involves
just as much in the way of "assumptions" as the assignment of a priori
probabilities to the corresponding propositions. Moreover, in view of the
reducibility of the Dempster/Shafer formalism to the formalism provided
by convex sets of classical probability functions (to be shown below),
we may recapture the assumptions about joint probability
distributions from the convex Bayesian representation.


4. One important novelty of the Dempster/Shafer system is
its ability to handle uncertain evidence. But even this is not in
itself anti-Bayesian. There are also Bayesian methods for handling
uncertain evidence. One of these, used in PROSPECTOR and mentioned
by Lowrance (1982), is known in the philosophical world as
Jeffrey's rule. It is presented and discussed in Jeffrey, 1965.

It follows from Bayes' theorem that
$$P(A) = P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B).$$


If you adopt a new (coherent) probability function $P'$, there are
essentially no constraints on $P'(A)$. But one can adopt the principle
that if a shift in probability originates in the assignment
of a new probability to $B$, that should not affect the conditional
probability of $A$ given $B$: $P(A \mid B) = P'(A \mid B)$. We have learned
something new about $B$, but we haven't learned anything new about the
bearing of the truth of $B$ on the truth of $A$.

Given this principle, the response to a shift in the probability
of $B$ from $P(B)$ to $P'(B)$, resulting from new evidence, should
propagate itself according to:
$$P'(A) = P(A \mid B)\,P'(B) + P(A \mid \neg B)\,P'(\neg B).$$
When new evidence leads us to shift our credence in $B$ from $P(B)$
to $P'(B)$, a corresponding shift in probability is induced for every
other proposition in the field: the new probability of a proposition
$A$ is the weighted average of the probability of $A$ given $B$ and the
probability of $A$ given not-$B$, weighted by the new probabilities of
$B$ and not-$B$.
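Jeffrey's rule as just stated can be sketched over the four atoms of the field generated by $A$ and $B$. The atom labels, the initial probabilities, and the new value $P'(B) = 0.8$ are hypothetical; `jeffrey_update` rescales the atoms inside $B$ and outside $B$ separately, which is exactly what keeps $P(\cdot \mid B)$ and $P(\cdot \mid \neg B)$ fixed.

```python
from fractions import Fraction as F

def jeffrey_update(p, b, new_pb):
    """Jeffrey's rule: shift P(B) to new_pb, keeping P(.|B) and P(.|not-B).

    p maps atoms to probabilities; b is the set of atoms where B holds.
    """
    old_pb = sum(v for w, v in p.items() if w in b)
    if not 0 < old_pb < 1:
        raise ValueError("P(B) must lie strictly between 0 and 1")
    return {
        w: v * new_pb / old_pb if w in b else v * (1 - new_pb) / (1 - old_pb)
        for w, v in p.items()
    }

# Hypothetical atoms of the field generated by A and B.
P = {"AB": F(3, 10), "A~B": F(1, 10), "~AB": F(2, 10), "~A~B": F(4, 10)}
B = {"AB", "~AB"}
A = {"AB", "A~B"}

P_new = jeffrey_update(P, B, F(4, 5))
p_a = sum(P_new[w] for w in A)
print(p_a)  # 13/25, i.e. P(A|B)*0.8 + P(A|~B)*0.2 = 0.6*0.8 + 0.2*0.2
```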

Lowrance (1982) worries about the problem of iterating this
move. Having made it, should we then update the probability of $B$
in the light of the new probability $P'(A)$? Wesley and Hanson (1982,
p. 15) worry about a potential "violation of Bayes' Law." But what
is offered is not a relaxation method: it is a method of evaluating
the impact of evidence which warrants a shift in the support for $B$.
It makes no sense to consider updating $P'(B)$ in the light of the
new value $P'(A)$; $P'(B)$ is the source of the updating. No contradiction
lurks here.

Other Bayesian updating procedures are possible (cf. Hartry
Field, 1978), but it is hard to think of one so simple and so natural.
This is particularly true in the epistemological framework considered
by Shafer: the weights of the subsets of $\Theta$ assigned masses reflect
our a priori intuitions, and there is no way in which the values of these
masses, given our observations, can be changed without changing the
model entirely. What impact given evidence has should not also
change according to the evidence we happen to have.

5. In order to investigate more closely the relations between
the Bayesian and Dempster/Shafer strategies for updating, it will
be helpful to have several formal results. In the present section
we establish the partial equivalence between the assignment of masses
to subsets of $\Theta$ (the space of possibilities) and the assignment of
a convex set of simple classical probability functions defined over
the atoms of $\Theta$. The equivalence is only partial, since some plausible
situations do not have a representation in terms of mass functions.
(Throughout, "$\subset$" is to be understood as proper or improper inclusion.)

Theorem 1:

Let $m$ be a probability mass function defined over a frame of
discernment $\Theta$. Let $Bel(X)$ be the corresponding belief function,
$Bel(X) = \sum_{A \subset X} m(A)$. Then there is a closed, convex set of
classical probability functions $S_P$ defined over the atoms of $\Theta$
such that for every subset $X$ of $\Theta$,
$$Bel(X) = \inf_{P \in S_P} P(X).$$

Proof: Let $S_P$ be the set of classical probability functions $P$ defined
on the atoms of $\Theta$ such that for every $X \subset \Theta$,
$Bel(X) \le P(X) \le 1 - Bel(\bar X)$. $S_P$ is closed, since it is defined
by finitely many non-strict linear inequalities on the values $P$ assigns
to the atoms. $S_P$ is convex, since for $0 \le a \le 1$,
$aP_1(X) + (1-a)P_2(X)$ lies between $Bel(X)$ and $1 - Bel(\bar X)$ whenever
$P_1(X)$ and $P_2(X)$ do. There is a $P \in S_P$ such that $P(X) = Bel(X)$:
assign the mass $m(A)$ of each focal element $A$ to a single atom of $A$,
chosen outside $X$ whenever $A \not\subset X$; the resulting $P$ satisfies
$Bel(Y) \le P(Y) \le 1 - Bel(\bar Y)$ for every $Y$. Hence
$Bel(X) \ge \inf_{P \in S_P} P(X)$. And $\inf_{P \in S_P} P(X) \ge Bel(X)$,
since this inequality holds for every $P \in S_P$.
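Theorem 1 can be checked numerically on a small example. The sketch below enumerates the probability functions that send each focal element's whole mass to one of its atoms (the construction used in the proof); it relies on the standard result, not proved in the text, that every member of $S_P$ is a convex combination of these, so their minimum and maximum of $P(X)$ give the infimum and supremum over $S_P$. The frame and masses are hypothetical.

```python
from fractions import Fraction as F
from itertools import chain, combinations, product

def subsets(frame):
    items = sorted(frame)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def bel(m, x):
    """Bel(X): total mass of focal elements included in X."""
    return sum(v for a, v in m.items() if a <= x)

def pl(m, x):
    """Plausibility 1 - Bel(complement of X): mass of focal elements meeting X."""
    return sum(v for a, v in m.items() if a & x)

def allocations(m):
    """Probability functions sending all of each focal element's mass to
    one of its atoms."""
    focal = list(m.items())
    for choice in product(*(sorted(a) for a, _ in focal)):
        p = {}
        for (_, v), atom in zip(focal, choice):
            p[atom] = p.get(atom, 0) + v
        yield p

THETA = frozenset({1, 2, 3})
M = {frozenset({1}): F(1, 2), frozenset({1, 2}): F(1, 4), THETA: F(1, 4)}
PS = list(allocations(M))

for x in subsets(THETA):
    values = [sum(p.get(t, 0) for t in x) for p in PS]
    assert min(values) == bel(M, x) and max(values) == pl(M, x)
print("Bel = inf P and Pl = sup P on every subset")
```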


Theorem 2

If $S_P$ is a closed convex set of classical probability functions
defined over the atoms of $\Theta$, and for every $A, B \subset \Theta$,
$$\inf_{P \in S_P} P(A \cup B) \ge \inf_{P \in S_P} P(A) + \inf_{P \in S_P} P(B) - \inf_{P \in S_P} P(A \cap B),$$
then there is a mass function $m$ defined over the subsets of $\Theta$ such
that for every $X \subset \Theta$, the corresponding Bel function satisfies
$$Bel(X) = \inf_{P \in S_P} P(X).$$

Proof: Since $S_P$ is closed and convex, for every $X \subset \Theta$ there is
a $P \in S_P$ such that $P(X) = \inf_{P \in S_P} P(X)$. For every $X \subset \Theta$
define $P_*(X)$ to be $\inf_{P \in S_P} P(X)$.

By Shafer's Theorem 2.1, if $\Theta$ is a frame of discernment then a
function $Bel: 2^\Theta \to [0, 1]$ is a belief function if and only if

(1) $Bel(\emptyset) = 0$; here $P_*(\emptyset) = 0$.

(2) $Bel(\Theta) = 1$; here $P_*(\Theta) = 1$.

(3) For every positive integer $n$ and every collection $A_1, \ldots, A_n$
of subsets of $\Theta$,
$$Bel(A_1 \cup \cdots \cup A_n) \ge \sum_{\emptyset \ne I \subset \{1, \ldots, n\}} (-1)^{|I|+1}\, Bel\Big(\bigcap_{i \in I} A_i\Big).$$

Since Shafer's Theorem 2.2 gives an algorithm to recapture the mass
function from the belief function, we need merely establish
$$(3') \quad P_*(A_1 \cup \cdots \cup A_n) \ge \sum_{\emptyset \ne I \subset \{1, \ldots, n\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

Suppose (3') fails. Then there is a collection $A_1, \ldots, A_n$, of
smallest cardinality $n$, for which (3') is false, i.e.,
$$P_*(A_1 \cup \cdots \cup A_n) < \sum_{\emptyset \ne I \subset \{1, \ldots, n\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

But
$$P_*(A_1 \cup \cdots \cup A_n) \ge P_*(A_n) + P_*(A_1 \cup \cdots \cup A_{n-1}) - P_*\big((A_1 \cup \cdots \cup A_{n-1}) \cap A_n\big)$$
by the hypothesis of the theorem, and
$$P_*\big((A_1 \cup \cdots \cup A_{n-1}) \cap A_n\big) = P_*\big((A_1 \cap A_n) \cup \cdots \cup (A_{n-1} \cap A_n)\big).$$

By hypothesis, (3') holds for collections of cardinality $n-1$. Thus
$$(4) \quad P_*\big((A_1 \cap A_n) \cup \cdots \cup (A_{n-1} \cap A_n)\big) \ge \sum_{\emptyset \ne I \subset \{1, \ldots, n-1\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i \cap A_n\Big)$$
$$(5) \quad P_*(A_1 \cup \cdots \cup A_{n-1}) \ge \sum_{\emptyset \ne I \subset \{1, \ldots, n-1\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

We compute $\sum_{\emptyset \ne I \subset \{1, \ldots, n\}} (-1)^{|I|+1} P_*(\bigcap_{i \in I} A_i)$,
evaluating the sum by cases: $|I| = 1$; $|I| > 1$ and $n \notin I$; $|I| > 1$
and $n \in I$.

$|I| = 1$:
$$\sum_{|I| = 1} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big) = P_*(A_n) + \sum_{i=1}^{n-1} P_*(A_i).$$

$|I| > 1$, $n \notin I$:
$$\sum_{I \subset \{1, \ldots, n-1\},\ |I| > 1} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

$|I| > 1$, $n \in I$: write $I = I' \cup \{n\}$ with $\emptyset \ne I' \subset \{1, \ldots, n-1\}$; then
$$\sum_{\emptyset \ne I' \subset \{1, \ldots, n-1\}} (-1)^{|I'|+2}\, P_*\Big(\bigcap_{i \in I'} A_i \cap A_n\Big) = \sum_{I \subset \{1, \ldots, n\},\ n \in I,\ |I| > 1} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

Combining the three terms, we have
$$\sum_{\emptyset \ne I \subset \{1, \ldots, n\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big) = P_*(A_n) + \sum_{\emptyset \ne I \subset \{1, \ldots, n-1\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big) - \sum_{\emptyset \ne I' \subset \{1, \ldots, n-1\}} (-1)^{|I'|+1}\, P_*\Big(\bigcap_{i \in I'} A_i \cap A_n\Big).$$


These two theorems show that the representation of uncertain knowledge
provided by Shafer's probability mass functions is exactly equivalent
to a representation provided by a convex set of classical probability
functions, and that the representation of uncertain knowledge by a convex
set of classical probability functions is exactly equivalent to a
representation provided by a probability mass function so long as the
convex set of probability functions satisfies the general relation
$$P_*(A \cup B) \ge P_*(A) + P_*(B) - P_*(A \cap B).$$
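The route of Theorem 2 (take the lower envelope $P_*$, check the hypothesis of the theorem, then recover masses as in Shafer's Theorem 2.2) can be sketched on a small example. Here $S_P$ is taken, hypothetically, to be the convex hull of two probability functions, whose lower envelope is the minimum over the two. `mobius` applies the inversion $m(A) = \sum_{B \subset A} (-1)^{|A - B|} P_*(B)$; rather than assuming the result is a mass function, the sketch checks that the recovered values are non-negative.

```python
from fractions import Fraction as F
from itertools import chain, combinations

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def lower_envelope(dists, frame):
    """P_*(X) = minimum of P(X) over a finite set of probability functions."""
    return {x: min(sum(p[t] for t in x) for p in dists) for x in subsets(frame)}

def two_monotone(lower, frame):
    """Hypothesis of Theorem 2: P_*(A u B) >= P_*(A) + P_*(B) - P_*(A n B)."""
    subs = subsets(frame)
    return all(lower[a | b] >= lower[a] + lower[b] - lower[a & b]
               for a in subs for b in subs)

def mobius(lower, frame):
    """Recover candidate masses: m(A) = sum over B in A of (-1)^|A-B| P_*(B)."""
    return {a: sum((-1) ** len(a - b) * lower[b] for b in subsets(a))
            for a in subsets(frame)}

THETA = frozenset({1, 2, 3})
P = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)}  # hypothetical extreme points of S_P
Q = {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}

env = lower_envelope([P, Q], THETA)
m = mobius(env, THETA)
print(two_monotone(env, THETA), all(v >= 0 for v in m.values()))
```

For these two functions the recovered masses are 1/4 on each singleton and 1/4 on $\{1, 3\}$, and they sum to 1.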


6. The main theorem of this section gives the relation
between convex Bayesian updating and Dempster/Shafer updating.

To establish the theorem requires two reductions, given by two lemmas.
The first provides an algorithm for computing the result of
Dempster/Shafer updating in response to uncertain evidence; the second
does the same thing for Bayesian updating.

Lemma 1: Let $\Theta$ be a frame of discernment. Let our initial belief
function be $Bel_1$. We obtain new evidence whose impact on the frame
of discernment $\Theta$ can be represented by a simple support function
(Shafer 1976, p. 7) $Bel_C$ whose single focus is $C \subset \Theta$: $Bel_C$
attributes mass $s$ to $C$ and mass $(1-s)$ to $\Theta$.

Let the foci of $Bel_1$ (the subsets $A$ of $\Theta$ receiving mass
$m_1(A) > 0$) be $A_1, A_2, \ldots, A_n$. We can construct a new frame of
discernment $\Theta'$ and a new belief function $Bel_1'$, such that

(a) For every $X \subset \Theta$, $Bel_1'(X) = Bel_1(X)$.

(b) For every $X \subset \Theta$, $(Bel_1 \oplus Bel_C)(X) = Bel_1'(X \mid E)$,
where $E \subset \Theta'$ and the evidence partially supporting $C$ provides
total support for $E$. "$\oplus$" represents the application of Dempster's
rule of combination to $Bel_1$ and $Bel_C$; $Bel_1'(X \mid E)$ represents
Dempster's rule of conditioning on $E$, the analog of Bayesian
conditionalization (Shafer 1976, p. 67).
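Dempster's rule of conditioning, used in (b), has a closed form, $Bel(X \mid E) = [Bel(X \cup \bar E) - Bel(\bar E)] / [1 - Bel(\bar E)]$, and agrees with combining $Bel$ by Dempster's rule with a mass function that puts all its mass on $E$. The sketch below checks this agreement on every subset of a hypothetical four-element frame; the masses and helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import chain, combinations, product

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def bel(m, x):
    return sum(v for a, v in m.items() if a <= x)

def combine(m1, m2):
    """Dempster's rule of combination."""
    raw = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        raw[a & b] = raw.get(a & b, 0) + x * y
    conflict = raw.pop(frozenset(), 0)
    if conflict == 1:
        raise ValueError("total conflict: combination undefined")
    return {c: v / (1 - conflict) for c, v in raw.items() if v}

def condition(m, x, e, frame):
    """Dempster conditioning: (Bel(X u ~E) - Bel(~E)) / (1 - Bel(~E))."""
    not_e = frame - e
    b = bel(m, not_e)
    if b == 1:
        raise ValueError("E has plausibility 0: conditioning undefined")
    return (bel(m, x | not_e) - b) / (1 - b)

THETA = frozenset({1, 2, 3, 4})
M = {frozenset({1}): F(1, 5), frozenset({1, 2}): F(1, 5), frozenset({3}): F(1, 5),
     frozenset({2, 3, 4}): F(1, 5), THETA: F(1, 5)}
E = frozenset({2, 3})

combined = combine(M, {E: F(1)})
for x in subsets(THETA):
    assert condition(M, x, E, THETA) == bel(combined, x)
print("conditioning on E = combining with m(E) = 1")
```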
Proof: Let $e$ be new to $\Theta$, and for every $\theta \in \Theta$ generate two new
"possibilities" $\theta e$ and $\theta\bar e$. Let
$\Theta' = \{\theta' : \exists \theta \in \Theta\,(\theta' = \theta e \lor \theta' = \theta\bar e)\}$,
and let $E = \{\theta' : \exists \theta \in \Theta\,(\theta' = \theta e)\}$. We identify
each $X \subset \Theta$ with the set $\{\theta e, \theta\bar e : \theta \in X\} \subset \Theta'$.
Since the evidence that supports $C$ is to render $E$ certain, we have $C' \subset E$,
i.e., $C' = \{\theta' : \exists \theta \in C\,(\theta' = \theta e)\}$.

We define $Bel_1'$ on the basis of $m_1'$ as follows.

$Bel_1'$ has $n$ foci of the form $A_i$, each with mass $(1-s)\,m_1(A_i)$, where
$m_1$ is the mass function associated with $Bel_1$.

For every $i$ such that $A_i \cap C' = \emptyset$, $A_i \cap \bar E$ is to be a
focus with mass $s \cdot m_1(A_i)$. For convenience we take the first $p$ of the
$A_i$ to be those for which $A_i \cap C' = \emptyset$. Note that $p$ may be 0; if
$s = 1$ it cannot be $n$, else $Bel_1 \oplus Bel_C$ would be undefined.

The remaining $A_i$ give rise to the remaining foci. These are of the
form $(A_i \cap C') \cup (A_i \cap \bar E)$, and receive the remaining mass. Since
$(A_i \cap C') \cup (A_i \cap \bar E) = (A_j \cap C') \cup (A_j \cap \bar E)$ is a
possibility for $i \ne j$, we write
$$m_1'\big((A_i \cap C') \cup (A_i \cap \bar E)\big) = \sum_{\{j\,:\,(A_j \cap C') \cup (A_j \cap \bar E) = (A_i \cap C') \cup (A_i \cap \bar E)\}} m_1(A_j) \cdot s.$$
Note that
$$\sum_{i=p+1}^{n} m_1'\big((A_i \cap C') \cup (A_i \cap \bar E)\big) = \sum_{i=p+1}^{n} m_1(A_i) \cdot s,$$
since these sets have positive mass only if $A_i \cap C' \ne \emptyset$.

We first show that $Bel_1'$ is a belief function. Obviously its mass
function $m_1'$ is non-negative for every $A \subset \Theta'$, so we need only show
that $\sum_{A \subset \Theta'} m_1'(A) = 1$. Summing over the three kinds of foci,
we have:
$$\sum_{A \subset \Theta'} m_1'(A) = \sum_{i=1}^{n} (1-s)\,m_1(A_i) + \sum_{i=1}^{p} s \cdot m_1(A_i) + \sum_{i=p+1}^{n} s \cdot m_1(A_i) = 1.$$

We next show that $Bel_1'$ is equivalent to $Bel_1$, i.e. that for any
$X \subset \Theta$, $Bel_1'(X) = Bel_1(X)$:
$$Bel_1'(X) = \sum_{A \subset X} m_1'(A) = \sum_{A_i \subset X} m_1'(A_i) + \sum_{A_i \cap \bar E \subset X} m_1'(A_i \cap \bar E) + \sum_{(A_i \cap C') \cup (A_i \cap \bar E) \subset X} m_1'\big((A_i \cap C') \cup (A_i \cap \bar E)\big).$$

The first term yields
$$\sum_{A_i \subset X} (1-s)\,m_1(A_i) = (1-s) \sum_{A_i \subset X} m_1(A_i).$$

Since $X = (X \cap E) \cup (X \cap \bar E)$, $A_i \cap \bar E \subset X \cap \bar E$ if and
only if $A_i \subset X$, in view of the fact that $\theta\bar e \in A_i \cap \bar E$ if
and only if $\theta \in A_i$, and the same holds for $X$. Thus the second term
yields $\sum_{A_i \subset X,\ i \le p} s \cdot m_1(A_i)$.

To evaluate the third term, we claim that $(A_i \cap C') \cup (A_i \cap \bar E) \subset X$
if and only if $A_i \subset X$. If $A_i \subset X$, then $A_i \cap C' \subset X$ and
$A_i \cap \bar E \subset X$, and so $(A_i \cap C') \cup (A_i \cap \bar E) \subset X$.
Suppose $(A_i \cap C') \cup (A_i \cap \bar E) \subset X$. Then $A_i \cap \bar E \subset X$,
so $A_i \cap \bar E \subset X \cap \bar E$, and by the preceding argument
$A_i \subset X$. Thus the third term yields $\sum_{A_i \subset X,\ p < i \le n} s \cdot m_1(A_i)$.

Putting the three parts together, we have $Bel_1'(X) = Bel_1(X)$.

We now show that conditioning on $E$ in the frame of discernment $\Theta'$
is equivalent to combining uncertain evidence $C$ with $Bel_1$ in the frame
of discernment $\Theta$ according to Dempster's rule of combination: for every
$X \subset \Theta$, $(Bel_1 \oplus Bel_C)(X) = Bel_1'(X \mid E)$.

$$(1) \quad (Bel_1 \oplus Bel_C)(X) = \frac{\sum_{A_i \cap C \subset X,\ A_i \cap C \ne \emptyset} m_1(A_i)\, s + \sum_{A_i \subset X} m_1(A_i)\,(1-s)}{1 - \sum_{A_i \cap C = \emptyset} m_1(A_i)\, s}$$

(The numerator comprises two sums, since $Bel_C$ has two foci: $C$ and $\Theta$,
with masses $s$ and $(1-s)$ respectively.)

$$(2) \quad Bel_1'(X \mid E) = \frac{\sum_{A \subset X \cup \bar E} m_1'(A) - \sum_{A \subset \bar E} m_1'(A)}{1 - \sum_{A \subset \bar E} m_1'(A)}$$

Now $\sum_{A \subset \bar E} m_1'(A) = \sum_{i=1}^{p} s \cdot m_1(A_i)$, since only the
foci of the form $A_i \cap \bar E$ are included in $\bar E$: $A_i = (A_i \cap E) \cup (A_i \cap \bar E)$
is not included in $\bar E$, and since $C' \subset E$, $(A_i \cap C') \cup (A_i \cap \bar E)$
is included in $\bar E$ only if $A_i \cap C' = \emptyset$, in which case it has no mass.
Also
$$\sum_{A_i \cap C' = \emptyset} m_1(A_i) \cdot s = \sum_{i=1}^{p} s \cdot m_1(A_i).$$
Hence the denominators of (1) and (2) are the same.

It remains to evaluate $\sum_{A \subset X \cup \bar E} m_1'(A)$. Consider foci of the
form $A_i$. $A_i \subset X \cup \bar E$ if and only if $A_i \subset X$, so these foci
yield mass
$$\sum_{A_i \subset X} m_1'(A_i) = \sum_{A_i \subset X} (1-s)\,m_1(A_i),$$
corresponding to the right-hand term in the numerator of (1).

Consider foci of the form $A_i \cap \bar E$. All of these are included in
$X \cup \bar E$; they yield
$$\sum_{i=1}^{p} m_1'(A_i \cap \bar E) = \sum_{i=1}^{p} s \cdot m_1(A_i) = \sum_{A \subset \bar E} m_1'(A),$$
so they drop out of the numerator of (2).

Finally, consider foci of the form $(A_i \cap C') \cup (A_i \cap \bar E)$. We first show
that $(A_i \cap C') \cup (A_i \cap \bar E) \subset X \cup \bar E$ if and only if
$A_i \cap C' \subset X$. Suppose $(A_i \cap C') \cup (A_i \cap \bar E) \subset X \cup \bar E$.
Then $A_i \cap C' \subset X \cup \bar E$. But $C' \subset E$, so
$A_i \cap C' = A_i \cap C' \cap E \subset X$. Suppose $A_i \cap C' \subset X$. Then since
$A_i \cap \bar E \subset \bar E \subset X \cup \bar E$, $(A_i \cap C') \cup (A_i \cap \bar E) \subset X \cup \bar E$.

We compute the mass in the numerator of (2) due to foci of
this sort. They have mass only when $A_i \cap C' \ne \emptyset$, and then they have
mass
$$\sum_{\{j\,:\,(A_j \cap C') \cup (A_j \cap \bar E) = (A_i \cap C') \cup (A_i \cap \bar E)\}} s \cdot m_1(A_j);$$
each $A_i$ such that $A_i \cap C' \subset X$ contributes $s \cdot m_1(A_i)$. Their total mass
is therefore
$$\sum_{A_i \cap C' \subset X,\ A_i \cap C' \ne \emptyset} s \cdot m_1(A_i),$$
corresponding to the first term of the numerator of (1), since $A_i \cap C' \subset X$
if and only if $A_i \cap C \subset X$.

We have therefore shown that $(Bel_1 \oplus Bel_C)(X) = Bel_1'(X \mid E)$.
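The construction in the proof of Lemma 1 can be checked numerically on one hypothetical example, which of course illustrates rather than proves the lemma. Atoms $\theta e$ and $\theta\bar e$ of $\Theta'$ are encoded as `(theta, True)` and `(theta, False)`; this encoding, the helper names (`lift`, `extend`), the masses, $C = \{1, 2\}$, and $s = 0.8$ are all assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import chain, combinations, product

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def bel(m, x):
    return sum(v for a, v in m.items() if a <= x)

def combine(m1, m2):
    """Dempster's rule of combination (assumes conflict < 1)."""
    raw = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        raw[a & b] = raw.get(a & b, 0) + x * y
    conflict = raw.pop(frozenset(), 0)
    return {c: v / (1 - conflict) for c, v in raw.items() if v}

def condition(m, x, e, frame):
    """Dempster conditioning on E."""
    not_e = frame - e
    b = bel(m, not_e)
    return (bel(m, x | not_e) - b) / (1 - b)

def lift(a):
    """Identify X in Theta with {theta e, theta not-e : theta in X}."""
    return frozenset((t, flag) for t in a for flag in (True, False))

def extend(m1, c, s):
    """The mass function m1' on Theta' built in the proof of Lemma 1."""
    c_prime = frozenset((t, True) for t in c)
    m = {}
    def add(focus, mass):
        if mass:
            m[focus] = m.get(focus, 0) + mass
    for a, v in m1.items():
        a_not_e = frozenset((t, False) for t in a)
        add(lift(a), (1 - s) * v)                      # foci A_i
        if lift(a) & c_prime:
            add((lift(a) & c_prime) | a_not_e, s * v)  # (A_i n C') u (A_i n ~E)
        else:
            add(a_not_e, s * v)                        # A_i n ~E
    return m

THETA = frozenset({1, 2, 3, 4})
M1 = {frozenset({1}): F(1, 5), frozenset({1, 2}): F(1, 5), frozenset({3}): F(1, 5),
      frozenset({2, 3, 4}): F(1, 5), THETA: F(1, 5)}
C, S = frozenset({1, 2}), F(4, 5)

M1p = extend(M1, C, S)
THETA_P = lift(THETA)
E = frozenset((t, True) for t in THETA)
combined = combine(M1, {C: S, THETA: 1 - S})
for x in subsets(THETA):
    assert bel(M1p, lift(x)) == bel(M1, x)                          # part (a)
    assert bel(combined, x) == condition(M1p, lift(x), E, THETA_P)  # part (b)
print("Lemma 1 holds on this example")
```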


Two remarks on this construction are in order.

First, we have given no rule for finding the "possibility" $E$. But in general
that should be no problem. Suppose $C$ is the proposition that there is a
squirrel on the roof of the barn. The light is bad, so $Bel_C$ assigns a
mass of only .8 to $C$, and assigns the remaining mass to $\Theta$. We take $E$
in $\Theta'$ to be the proposition that it seems (.8) to be the case that there
is a squirrel on the roof, for which the evidence is conclusive. The
index 0.8 indicates the force of the seeming, and is reflected in our
assignment of masses in $\Theta'$. In many situations it seems quite natural
to replace "uncertain evidence" by the "certain" data on which it is
based.

Second, however, whether or not we can always do this is unimportant
for the comparison of Bayesian and Dempster conditioning. We can
regard the introduction of $E$ as merely a computational device that
helps us to compare the distribution of masses in $\Theta$ according to the
function $Bel_1 \oplus Bel_C$ with the corresponding set of Bayesian conditional
distributions.

We now present an analogous lemma for Bayesian conditionalization
based on Jeffrey's rule for uncertain evidence.

Lemma 2.

Suppose that $P_0$ is our original assignment of probabilities to the
field $F$ of propositions whose basis is $a_1, \ldots, a_n$. As a result of
stimulation of our sense organs, or unreliable observation, we shift the
probability assigned to $A$ from $P_0(A)$ to $P_1(A)$. By Jeffrey's rule, for
$X \in F$,
$$P_1(X) = P_0(X \mid A)\,P_1(A) + P_0(X \mid \neg A)\,P_1(\neg A).$$

Let us add a new atomic proposition $e$ to the basis of $F$ to obtain
the field $F'$, and represent it by $E$. We impose the constraint
$P_0'(A \mid E) = P_1(A)$; $P_0'(E)$ may have any value that strikes our fancy,
subject to the non-negativity condition given below.

We extend $P_0$ to $P_0'$ so that for any $X \in F$, $P_0'(X) = P_0(X)$: $P_0'$ is
fully equivalent to $P_0$, so far as $F$ is concerned, before we obtain
information about $A$. Specifically, set
$$k = \frac{P_1(A)}{P_0(A)}, \qquad k' = \frac{P_1(\neg A)}{P_0(\neg A)} = \frac{1 - P_1(A)}{1 - P_0(A)}.$$
For $X \in F$, set
$$P_0'(X \wedge E) = P_0'(E)\,\big[k\,P_0(X \wedge A) + k'\,P_0(X \wedge \neg A)\big],$$
$$P_0'(X \wedge \neg E) = P_0(X) - P_0'(X \wedge E).$$
These values are non-negative provided $P_0'(E) \le 1/\max(k, k')$.
Clearly, for $X \in F$,
$$P_0'(X) = P_0'(X \wedge E) + P_0'(X \wedge \neg E) = P_0(X).$$

We now show that for $X \in F$, probabilities conditional on $E$ are equal
to the probabilities given by Jeffrey's rule: $P_1(X) = P_0'(X \mid E)$.
For $X \in F$,
$$P_0'(X \mid E) = \frac{P_0'(X \wedge E)}{P_0'(E)} = k\,P_0(X \wedge A) + k'\,P_0(X \wedge \neg A) = P_1(A)\,\frac{P_0(X \wedge A)}{P_0(A)} + P_1(\neg A)\,\frac{P_0(X \wedge \neg A)}{P_0(\neg A)} = P_1(X).$$

The same remarks may be made with regard to this construction
as were made with regard to the previous one. Although we haven't
given a rule for specifying $E$, it shouldn't be too hard in most
circumstances to come up with a plausible one, and in any event we can
construe the construction as a computational device to make it easier
to compare Dempster conditioning and Bayesian conditionalization.

The following theorem shows that in the case of certain evidence,
Dempster/Shafer updating yields narrower probability intervals than
does Bayesian updating. The next theorem employs Lemmas 1 and 2
to show that this relation holds in general, and not only when our
evidence is certain.


Theorem  3: 

Let  S  be  a  frame  of  discernment,  Bel  a  belief  function,  and  _Sp  the 
corresponding  set  of  Ba’esian  probability  functions.  Let  3_  be 
evidence  assigned  probability  1,  or  support  1.  Then  for  Ac  8  , 

inf  P ( A  B )  <  Be  1(  A;  B)  <  P_*(A_!  B)  <  su£  P_( A j  B ) 

feS  PcS„ 

~I  — P. 

where  P*(A|jJ)  =  1  -  Bel  (A ;  3)  is  Shafer's  plausibility  function. 
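Proof: (All infima and suprema are taken over $S_P$.)

$$\inf P(A \mid B) = \frac{\inf P(A \wedge B)}{\inf P(A \wedge B) + \sup P(\bar A \wedge B)}$$

$$\sup P(A \mid B) = \frac{\sup P(A \wedge B)}{\sup P(A \wedge B) + \inf P(\bar A \wedge B)}$$

$$\mathrm{Bel}(A \mid B) = \frac{\mathrm{Bel}(A \vee \bar B) - \mathrm{Bel}(\bar B)}{1 - \mathrm{Bel}(\bar B)}$$

$$P^*(A \mid B) = \frac{P^*(A \wedge B)}{P^*(B)} = \frac{1 - \mathrm{Bel}(\bar A \vee \bar B)}{1 - \mathrm{Bel}(\bar B)}$$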


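By computations from Table I of the appendix, we obtain:

$$\inf P(A \mid B) = \frac{x_1}{(x_1 + x_3) + (x_{13} + x_{23} + x_{34}) + (x_{123} + x_{134} + x_{234}) + x_\Theta}$$

$$\mathrm{Bel}(A \mid B) = \frac{x_1 + [x_{12} + x_{14} + x_{124}]}{(x_1 + x_3) + (x_{13} + x_{23} + x_{34}) + (x_{123} + x_{134} + x_{234}) + x_\Theta + [x_{12} + x_{14} + x_{124}]}$$

$$P^*(A \mid B) = \frac{x_1 + x_{12} + x_{13} + x_{14} + x_{123} + x_{124} + x_{134} + x_\Theta}{(x_1 + x_3) + (x_{12} + x_{13} + x_{14}) + (x_{123} + x_{124} + x_{134}) + x_\Theta + [x_{23} + x_{34} + x_{234}]}$$

from which the inequalities easily follow.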
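The four quantities compared in Theorem 3 can be computed directly from a mass function. The sketch below is a hedged illustration, not the paper's own procedure: it encodes focal sets as Python frozensets of the atom numbers of Table I (1 = AB, 2 = A¬B, 3 = ¬AB, 4 = ¬A¬B), takes the infimum and supremum from the two ratio formulas in the proof above, and uses exact fractions so that equalities can be checked without rounding. The helper names are hypothetical.

```python
from fractions import Fraction as F

THETA = frozenset({1, 2, 3, 4})  # atoms as in Table I: 1=AB, 2=A~B, 3=~AB, 4=~A~B

def bel(m, s):
    """Lower measure: total mass of focal sets contained in s."""
    return sum((v for t, v in m.items() if t <= s), F(0))

def pl(m, s):
    """Upper measure: total mass of focal sets that intersect s."""
    return sum((v for t, v in m.items() if t & s), F(0))

def theorem3_bounds(m, a, b):
    """Return (inf P(A|B), Bel(A|B), P*(A|B), sup P(A|B))."""
    not_a, not_b = THETA - a, THETA - b
    inf_p = bel(m, a & b) / (bel(m, a & b) + pl(m, not_a & b))
    sup_p = pl(m, a & b) / (pl(m, a & b) + bel(m, not_a & b))
    bel_d = (bel(m, a | not_b) - bel(m, not_b)) / (1 - bel(m, not_b))  # Dempster conditioning
    pl_d = pl(m, a & b) / pl(m, b)
    return inf_p, bel_d, pl_d, sup_p

# Mass column of Table II; focal sets with zero mass are omitted.
table2 = {frozenset({1}): F(2, 10), frozenset({2}): F(2, 10), frozenset({3}): F(1, 10),
          frozenset({4}): F(2, 10), frozenset({1, 2}): F(1, 10),
          frozenset({1, 4}): F(1, 10), frozenset({2, 4}): F(1, 10)}
A, B = frozenset({1, 2}), frozenset({1, 3})
print(theorem3_bounds(table2, A, B))
# (Fraction(2, 3), Fraction(4, 5), Fraction(4, 5), Fraction(4, 5))
```

These bounds come from the mass column of Table II, not its frequency column. Because that mass function puts nothing on $x_{13}$, $x_{23}$, $x_{34}$, the triples, or $\Theta$, the Dempster value 4/5 coincides with the Bayesian upper bound, while the positive masses on $x_{12}$ and $x_{14}$ leave the Bayesian lower bound (2/3) strictly below it.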
Corollary:

(1) $\inf P(A \mid B) = \mathrm{Bel}(A \mid B)$ iff $x_{12} + x_{14} + x_{124} = 0$

(2) $\mathrm{Bel}(A \mid B) = P^*(A \mid B)$ iff $x_{13} + x_{123} + x_{134} + x_\Theta = 0$

(3) $\sup P(A \mid B) = P^*(A \mid B)$ iff $x_{23} + x_{34} + x_{234} = 0$

Theorem 4: If we apply Dempster's rule of combination to any evidence
represented by a separable support function (our initial state need not
be so represented) we obtain constraints more severe than those we get
from Bayesian conditionalization applied to the same initial state.
Proof: A separable support function may be represented as the combination
of simple support functions. By Lemma 1, the effect of a simple
support function can be represented by Dempster conditioning. By
Theorem 2, the initial state can be represented by a closed convex set
of Bayesian probability functions. By Lemma 9 the effect of uncertain
evidence (as reflected by a simple support function) can be represented
by Bayesian conditionalization. By Theorem 3 the belief intervals resulting
from Bayesian conditionalization will include the belief intervals
obtained from Dempster conditioning. Therefore the result of
applying Dempster's rule of conditioning will lead to belief intervals
more severely constrained than the convex Bayesian intervals corresponding
to them.
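Theorem 4 treats a separable support function as a Dempster combination of simple support functions. A sketch of Dempster's rule of combination under the frozenset encoding of focal sets (atoms numbered as in Table I) follows; `combine` and `simple_support` are hypothetical names, not an existing library API. Combining with a simple support function of degree 1 focused on $B$ should reproduce Dempster conditioning on $B$, which gives a quick consistency check.

```python
from fractions import Fraction as F
from itertools import product

THETA = frozenset({1, 2, 3, 4})  # atoms numbered as in Table I

def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozenset."""
    joint, conflict = {}, F(0)
    for (s, v), (t, w) in product(m1.items(), m2.items()):
        inter = s & t
        if inter:
            joint[inter] = joint.get(inter, F(0)) + v * w
        else:
            conflict += v * w  # product mass that lands on the empty set
    if conflict == 1:
        raise ValueError("the two bodies of evidence are totally conflicting")
    # Renormalize the non-conflicting mass and drop zero-mass focal sets.
    return {s: v / (1 - conflict) for s, v in joint.items() if v}

def simple_support(focus, degree, theta=THETA):
    """Simple support function: `degree` on focus, the remainder on theta."""
    focus = frozenset(focus)
    if focus == theta:
        return {theta: F(1)}
    return {focus: F(degree), theta: 1 - F(degree)}

def bel(m, s):
    """Belief: total mass of focal sets contained in s."""
    return sum((v for t, v in m.items() if t <= s), F(0))

# A separable support function built from two simple support functions.
sep = combine(simple_support({1, 2}, F(3, 5)), simple_support({1, 3}, F(1, 2)))

# Support 1 focused on B should reproduce Dempster conditioning on B.
table2 = {frozenset({1}): F(2, 10), frozenset({2}): F(2, 10), frozenset({3}): F(1, 10),
          frozenset({4}): F(2, 10), frozenset({1, 2}): F(1, 10),
          frozenset({1, 4}): F(1, 10), frozenset({2, 4}): F(1, 10)}
A, B = frozenset({1, 2}), frozenset({1, 3})
conditioned = combine(table2, simple_support(B, 1))
not_b = THETA - B
dempster_cond = (bel(table2, A | not_b) - bel(table2, not_b)) / (1 - bel(table2, not_b))
print(bel(conditioned, A), dempster_cond)  # 4/5 4/5
```

The two simple support functions in the first example have focal sets that always intersect, so no mass is lost to conflict; in the second example half of the Table II mass conflicts with $B$ and is renormalized away.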

8. Dempster/Shafer evidential updating, we have seen, leads to
more tightly constrained representations of rational belief than
does convex Bayesian updating. It might be thought that this is
a virtue. But whether or not this is a Good Thing is open to question.

Suppose that $D = \{D_1, \ldots, D_n\}$ are alternative decisions open to
you, and that you have a utility function defined over the cross
product of $D$ and the set $\Theta$ of possible states. You begin with a belief
function, and you obtain some evidence. If you combine this evidence
with your initial belief function according to convex Bayesian
conditionalization, your new beliefs will be characterized by a set
of probability functions $\mathbf{P}_B$. If you perform the combination of evidence
according to non-Bayesian procedures, your new beliefs will be characterized
by a set of probability functions $\mathbf{P}_N$ that is (in general) a proper
subset of $\mathbf{P}_B$.

Given any probability function $P$ in either $\mathbf{P}_B$ or $\mathbf{P}_N$, you can
calculate the expected value of each decision: $E(D_i, P)$. Let us
say that $D_i$ is admissible relative to a set of probability functions
just in case there is some probability function in the set according
to which the expected value of $D_i$ is at least as great as the expected
value of any other decision. Since $\mathbf{P}_N$ is included in $\mathbf{P}_B$, the admissible
decisions we obtain if we update in a non-Bayesian way are included
among those we obtain if we update in a Bayesian way.
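The admissibility test just defined can be sketched in a few lines. This is an illustration under stated assumptions, not a procedure from the paper: it checks a finite list of probability functions (here a hypothetical grid over an interval for the probability of one state), so for a convex set it only approximates the definition, since a decision could be optimal on part of the set that no sampled point hits. The ticket utilities and the two intervals are illustrative values chosen to resemble the urn example that follows; all function names are hypothetical.

```python
from fractions import Fraction as F

def expected_value(decision, p, utility):
    """E(D, P): probability-weighted utility of a decision over the states."""
    return sum(prob * utility[(decision, state)] for state, prob in p.items())

def admissible(decisions, prob_functions, utility):
    """Decisions that are optimal under at least one listed probability function."""
    result = set()
    for p in prob_functions:
        values = {d: expected_value(d, p, utility) for d in decisions}
        best = max(values.values())
        result |= {d for d, v in values.items() if v == best}
    return result

def interval_sample(lo, hi, steps):
    """Hypothetical helper: evenly spaced P(magnetic) values in [lo, hi]."""
    pts = [lo + (hi - lo) * F(i, steps) for i in range(steps + 1)]
    return [{"magnetic": q, "not magnetic": 1 - q} for q in pts]

# Illustrative utilities: a $.75 ticket that pays $1 if the ball is magnetic.
utility = {("buy", "magnetic"): F(1, 4), ("buy", "not magnetic"): F(-3, 4),
           ("decline", "magnetic"): F(0), ("decline", "not magnetic"): F(0)}
decisions = ["buy", "decline"]

wide = interval_sample(F(1, 2), F(4, 5), 6)    # assumed Bayesian interval [0.5, 0.8]
narrow = interval_sample(F(4, 5), F(4, 5), 1)  # assumed degenerate interval [0.8, 0.8]
print(admissible(decisions, wide, utility))    # both decisions
print(admissible(decisions, narrow, utility))  # {'buy'}
```

The narrower set of probability functions yields a subset of the admissible decisions, as the text argues; here it happens to single out one decision, which is the third of the cases distinguished next.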

There are three cases to consider. (1) We obtain the same set
of admissible decisions by either updating procedure. In this case
we have gained nothing. (2) If $\mathbf{P}_N$ leads to a set of admissible
decisions containing more than one member, then so does $\mathbf{P}_B$, and we
must in either case invoke additional constraints in order to generate
a unique decision. (3) If $\mathbf{P}_N$ leads to a unique admissible decision
and $\mathbf{P}_B$ does not, we appear to have accomplished something useful
by means of non-Bayesian updating.

But it is open to question whether the added power should be
built into the evidential updating rule, or whether it should appear
as part of a decision procedure that takes us beyond the evidence.
Many people feel that principles of evidence and principles of decision
should be kept distinct.

Consider an urn filled with black and white iron balls, some of which
are magnetized and some of which are not. It is easy to imagine that
by extensive sampling, or by word of the manufacturer, our statistical
knowledge about the contents of the urn may be as represented in Table II
of the appendix, where the set of black balls is represented by $A$, and
the set of magnetized balls is represented by $B$. Given that this
is our initial state, we may ask what our attitude should be toward
the proposition that a ball selected from the urn is magnetic, given
that it is white.

Dempster conditioning yields the degenerate interval [0.8, 0.8].

Bayesian conditionalization yields the interval [0.5, 0.8].


Suppose you are offered a ticket for $.75 that returns a dollar
if the ball is magnetic. On the view identified with Dempster and
Shafer, it is not only permissible, but, given the usual utility
function, mandatory to buy it. On the convex Bayesian view either
accepting or rejecting the offer would be admissible. It is true
that, for all you know, the true expectation is positive; but it is
also true, for all you know, that the true expectation is negative. If
everything you know is true, the expected value of buying the ticket may still be -$.25.

On the other hand, there are cases where Dempster's rule of
combination leads to intuitively appealing results, but the convex
Bayes approach does not.⁹ Suppose you know that 70% of the soft
berries in a certain area are good to eat, and that 60% of the red
berries are good to eat. What are the chances that a soft red berry
is good to eat? The rule yields .42/.54 = .78, which has intuitive
appeal. But the set of distributions compatible with the conditions
of the problem leaves the probability of a soft red berry being good
to eat completely undetermined: it is the entire interval [0, 1]! It
is possible that 100% of the soft red berries are good, and it is
possible that 0% of the soft red berries are good.
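The figure .42/.54 comes from treating each percentage as a Bayesian mass function on the frame {good, bad} and applying the rule of combination. A small sketch with exact arithmetic (the function name is hypothetical) shows where the numerator and denominator come from: .42 is the product mass on "good", and .54 is what remains after the conflicting products are discarded.

```python
from fractions import Fraction as F

def dempster_binary(p, q):
    """Combine two Bayesian mass functions on the frame {good, bad}.

    p and q are the masses each item of evidence gives to 'good'; the rest
    goes to 'bad'. Products landing on the empty set (good and bad) form
    the conflict, which Dempster's rule discards before renormalizing.
    """
    both_good = p * q
    conflict = p * (1 - q) + (1 - p) * q
    return both_good / (1 - conflict), conflict

value, conflict = dempster_binary(F(7, 10), F(6, 10))
print(value, float(value), conflict)  # 7/9 0.7777777777777778 23/50
```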

It is clear that in applying the rule of combination, we are
implicitly constraining the set of (joint) distributions we regard
as possible. This is suggested by Shafer's requirement that the items
of evidence to be combined be "distinct" or "independent". The most
natural sufficient condition that leads to the same result as Dempster's
rule of combination is that all the probability functions in our convex
set satisfy the three conditions (writing $G$ for "good to eat", $S$ for "soft", and $R$ for "red")

(i) $P(G) = \tfrac{1}{2}$

(ii) $P(S \mid G \wedge R) = P(S \mid G)$

(iii) $P(S \mid \bar G \wedge R) = P(S \mid \bar G)$.

Condition (i), of course, is our old friend, the principle of
indifference. Conditions (ii) and (iii) might be called inverse
conditional independence, and it is not hard to imagine that we
have warrant for supposing they are satisfied.

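The exact necessary and sufficient conditions for agreement between
the two methods are that our set of probability functions satisfy one
of the two conditions

(iv) $\dfrac{P(G \wedge R \wedge S)}{P(\bar G \wedge R \wedge S)} = \dfrac{P(G \wedge R)\,P(G \wedge S)}{P(\bar G \wedge R)\,P(\bar G \wedge S)}$

or (v) $\dfrac{P(S \mid G \wedge R)}{P(S \mid G)} = \dfrac{P(G)}{P(\bar G)} \cdot \dfrac{P(S \mid \bar G \wedge R)}{P(S \mid \bar G)}$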
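A quick numerical check, under hypothetical conditional probabilities that are not part of the berry example: the sketch builds a joint distribution over $G$, $S$, $R$ in which (ii) and (iii) hold by construction. The values are chosen so that, when $P(G) = \tfrac{1}{2}$, $P(G \mid S) = 0.7$ and $P(G \mid R) = 0.6$ as in the berry figures. With $P(G) = \tfrac{1}{2}$, Bayesian conditioning on both items of evidence agrees with Dempster's combination of $P(G \mid S)$ and $P(G \mid R)$; with $P(G) = \tfrac{1}{4}$ it does not, and condition (iv) tracks the difference. All names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def joint(p_g, s_given, r_given):
    """P over (g, s, r): S and R independent given G and given not-G (ii, iii)."""
    dist = {}
    for g, s, r in product([True, False], repeat=3):
        pg = p_g if g else 1 - p_g
        ps = s_given[g] if s else 1 - s_given[g]
        pr = r_given[g] if r else 1 - r_given[g]
        dist[(g, s, r)] = pg * ps * pr
    return dist

def prob(dist, event):
    return sum((v for k, v in dist.items() if event(*k)), F(0))

def dempster_vs_bayes(dist):
    """Dempster combination of P(G|S) and P(G|R) versus Bayesian P(G|S&R)."""
    g_s = prob(dist, lambda g, s, r: g and s) / prob(dist, lambda g, s, r: s)
    g_r = prob(dist, lambda g, s, r: g and r) / prob(dist, lambda g, s, r: r)
    dempster = g_s * g_r / (g_s * g_r + (1 - g_s) * (1 - g_r))
    bayes = prob(dist, lambda g, s, r: g and s and r) / prob(dist, lambda g, s, r: s and r)
    return dempster, bayes

def condition_iv(dist):
    """Check (iv): P(GRS)/P(~GRS) == P(GR)P(GS) / (P(~GR)P(~GS))."""
    lhs = (prob(dist, lambda g, s, r: g and r and s)
           / prob(dist, lambda g, s, r: (not g) and r and s))
    rhs = (prob(dist, lambda g, s, r: g and r) * prob(dist, lambda g, s, r: g and s)) / (
        prob(dist, lambda g, s, r: (not g) and r) * prob(dist, lambda g, s, r: (not g) and s))
    return lhs == rhs

soft = {True: F(7, 10), False: F(3, 10)}   # hypothetical P(S | G), P(S | not G)
red = {True: F(3, 5), False: F(2, 5)}      # hypothetical P(R | G), P(R | not G)
print(dempster_vs_bayes(joint(F(1, 2), soft, red)))  # (Fraction(7, 9), Fraction(7, 9))
print(dempster_vs_bayes(joint(F(1, 4), soft, red)))  # (Fraction(7, 25), Fraction(7, 13))
```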

If  our  evidence  is  statistical  in  character,  it  clearly  behooves 
us  to  unpack  the  statistical  assumptions  underlying  our  employment 
of  non-Bayesian  updating  procedures.  But  what  if  our  evidence  is 
not  statistical  in  character? 


One plausible response is that Dempster's rule of combination
is not designed for all cases in which you have statistical data to
serve as input. Sometimes the masses in the belief function are
determined by frequencies, and sometimes they are not; only when they
are not determined by frequencies should we apply non-Bayesian updating.
It is difficult to make a case against this response except by making
a case for the claim that all responsible and useful probabilities,
even very vague ones, are based on statistical knowledge. But then
we must also face the problem of how to treat evidence which is mixed,
which contains both statistical components and intuitive components.
While it is a theorem that Dempster combination is both commutative
and associative, and also a theorem that Bayesian combination is both
commutative and associative, it is obviously not the case that a
mixture of Dempster and Bayesian methods need be commutative and
associative.

It should be strongly emphasized that the present arguments are
not intended as arguments in favor of the general applicability of convex
Bayesian conditionalization. Rather, what I have shown is (1) that
the representation of belief states by distributions of masses over
subsets of a set $\Theta$ of possibilities is a special case of the convex
Bayesian representation in terms of simple classical probabilities
over the atoms of $\Theta$, (2) that the treatments of uncertain evidence
in both Bayesian and non-Bayesian updating are reducible to the corresponding
treatments of certain evidence, and (3) that non-Bayesian
updating yields more determinate belief states as outcomes, but that
the benefits afforded by non-Bayesian updating are limited and questionable.




Table I

Atomic propositions A, B

Subset          Mass   Lower Measure                    Upper Measure
(1) AB          x1     x1                               1-x2-x3-x4-x23-x24-x34-x234
(2) A¬B         x2     x2                               1-x1-x3-x4-x13-x14-x34-x134
(3) ¬AB         x3     x3                               1-x1-x2-x4-x12-x14-x24-x124
(4) ¬A¬B        x4     x4                               1-x1-x2-x3-x12-x13-x23-x123
(1)∪(2)         x12    x1+x2+x12                        1-x3-x4-x34
(1)∪(3)         x13    x1+x3+x13                        1-x2-x4-x24
(1)∪(4)         x14    x1+x4+x14                        1-x2-x3-x23
(2)∪(3)         x23    x2+x3+x23                        1-x1-x4-x14
(2)∪(4)         x24    x2+x4+x24                        1-x1-x3-x13
(3)∪(4)         x34    x3+x4+x34                        1-x1-x2-x12
(1)∪(2)∪(3)     x123   x1+x2+x3+x12+x13+x23+x123        1-x4
(1)∪(2)∪(4)     x124   x1+x2+x4+x12+x14+x24+x124        1-x3
(1)∪(3)∪(4)     x134   x1+x3+x4+x13+x14+x34+x134        1-x2
(2)∪(3)∪(4)     x234   x2+x3+x4+x23+x24+x34+x234        1-x1
Θ               xΘ     1                                1

xΘ = 1 - Σ xi
Table II

A: white    B: magnetic

        Mass    Frequency
x1      0.2     [0.2, 0.4]
x2      0.2     [0.2, 0.4]
x3      0.1     [0.1, 0.2]
x4      0.2     [0.2, 0.5]
x12     0.1     [0.4, 0.7]
x13     0.0     [0.2, 0.5]
x14     0.1     [0.4, 0.7]
x23     0.0     [0.3, 0.5]
x24     0.1     [0.4, 0.7]
x34     0.0     [0.3, 0.5]
x123    0.0     [0.6, 0.8]
x124    0.0     [0.8, 0.9]
x134    0.0     [0.6, 0.8]
x234    0.0     [0.6, 0.8]
xΘ      0.0     [1.0, 1.0]

1. This approach is similar to that of Smith (1961). It is also similar
to the approach of Levi (1974, 1981), Good (1962), and Kyburg (1974), but
as Levi points out in (1981) there are important differences. Levi
represents a credal state by a closed convex set of conditional probability
functions. Since distinct closed convex sets of conditional probability
functions give rise to the same closed convex sets of simple probability
functions (probabilities conditional on tautological evidence), the two
representations are not equivalent. Smith and Kyburg represent a credal
state by the convex closure of all probabilities consistent with a set of
probability intervals. Shafer, as will be seen, implicitly offers the
same characterization. Dempster (1968) offers a more restricted characterization:
the convex set representing the credal state is the largest that
both satisfies the interval constraints, and can be obtained from a space
of "simple joint propositions" in a certain way. Levi has shown (1981,
pp. 338-392) that these additional restrictions are incompatible with
certain natural forms of direct inference of probabilities from known
statistics.

2.  In  another  place  I  shall  argue  that  we  can  found  all  our  probabilities 
on  direct  or  indirect  statistical  inference,  or  on  set-theoretical  truths. 
No  other  source  is  needed. 

3. An example suggested in conversation by Teddy Seidenfeld is this:
consider a compound experiment consisting of either tossing a fair coin
twice, or drawing a coin from a bag containing 40% double-headed and
60% double-tailed coins. The two parts of the compound are performed in
an unknown ratio. Let $A$ be the event that the first toss lands heads and
$B$ the event that the second toss lands tails. The representation by a
convex set of probability functions is straightforward, but

$$P_*(A \cup B) = 0.75 < 0.9 = P_*(A) + P_*(B) - P_*(A \cap B) = 0.4 + 0.5 - 0.0.$$

By theorem 2.1 of Shafer 1976, $P_*$ is therefore not a belief function.
It is possible to compute a mass function, but the masses assigned to the
union of any three atoms must be negative.

4.  This  result  was  stated  informally  by  Levi  (1967). 

5.  Dempster  (1967,  1968)  was  well  aware  that  his  rule  of  combination 
led  to  results  stronger  than  those  that  would  be  given  by  a  mere 
generalization  of  Bayesian  inference.  His  reasons  for  preferring  the 
rule  at  which  he  arrives  are  essentially  philosophical:  in  a  classical 
Bayesian  framework,  unless  you  restrict  the  family  of  priors,  you  don’t 
get useful results starting with zero information. But in expert systems,
we  have  no  desire  or  need  to  start  with  zero  information. 

6. Quinlan's (1982) subtitle suggests the opposite: "A cautious
approach to uncertain inference."

7.  It  is  not  clear  that  Shafer's  belief  functions  were  intended  to  be 
used  in  a  decision-theoretic  context.  Even  if  they  were,  there  would 
be  serious  difficulties  standing  in  the  way  of  such  employment.  (See 
Levi  (197S,  1930,  1983),  and  Seidenfeld  (1973)).  For  present  purposes, 
these  difficulties  need  not  concern  us. 


8. This corresponds to Levi's (…) admissibility.


9.  This  elegant  and  simple  example  was  proposed  by  Jerry  Feldman. 
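The figures in note 3 can be checked mechanically. The sketch below assumes, as those figures require, that $B$ is the event that the second toss lands tails. It computes lower probabilities as minima over the two pure experiments (valid because each probability is linear in the unknown mixing ratio), exhibits the violation of the inequality $P_*(A \cup B) \ge P_*(A) + P_*(B) - P_*(A \cap B)$ that a belief function must satisfy, and recovers the negative masses on three-atom unions by Möbius inversion. All names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations, product

OUTCOMES = list(product("HT", repeat=2))   # (first toss, second toss)
fair = {o: F(1, 4) for o in OUTCOMES}      # fair coin tossed twice
bag = {o: F(0) for o in OUTCOMES}
bag[("H", "H")] = F(2, 5)                  # 40% double-headed
bag[("T", "T")] = F(3, 5)                  # 60% double-tailed

def lower(event):
    """Lower probability over all mixtures of the two experiments.

    P(event) is linear in the unknown mixing ratio, so its minimum is
    attained by one of the two pure experiments.
    """
    return min(sum(fair[o] for o in OUTCOMES if event(o)),
               sum(bag[o] for o in OUTCOMES if event(o)))

A = lambda o: o[0] == "H"   # first toss lands heads
B = lambda o: o[1] == "T"   # second toss lands tails (the reading that fits note 3)

union = lower(lambda o: A(o) or B(o))
rhs = lower(A) + lower(B) - lower(lambda o: A(o) and B(o))
print(union, rhs)           # 3/4 9/10

def mass(atoms):
    """Moebius inversion of the lower probability: m(S) = sum (-1)^|S-T| lower(T)."""
    return sum((-1) ** (len(atoms) - k) * lower(lambda o, t=t: o in t)
               for k in range(len(atoms) + 1) for t in combinations(atoms, k))

print([mass(t) for t in combinations(OUTCOMES, 3)])  # every entry is -3/20
```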